Polynomial Regression
Core Concept
Polynomial regression extends linear regression by including powers (and optionally products) of the features in the design matrix. For a single feature \(x\), the model might be \(y = \beta_0 + \beta_1 x + \beta_2 x^2 + \cdots + \beta_d x^d\); for multiple features, polynomial terms include \(x_i^2\), \(x_i x_j\), and so on, up to a chosen degree. The model remains linear in the parameters (each term is just a new “feature” with its own coefficient), so the same least-squares machinery (normal equation or gradient descent) applies. The result is a parametric curve or surface that can capture curvature and non-linear trends while staying interpretable and within the linear-algebra framework of regression.
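As a minimal sketch of that fitting machinery (synthetic data and an arbitrary degree, chosen purely for illustration), the snippet below builds the degree-\(d\) design matrix with NumPy and solves the least-squares problem directly:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic data: a cubic trend plus noise (illustrative only).
x = rng.uniform(-2, 2, size=50)
y = 1.0 - 2.0 * x + 0.5 * x**3 + rng.normal(scale=0.3, size=x.shape)

degree = 3
# Design matrix with columns 1, x, x^2, ..., x^d (a Vandermonde matrix).
X = np.vander(x, N=degree + 1, increasing=True)

# Ordinary least squares: minimize ||X @ beta - y||^2.
beta, *_ = np.linalg.lstsq(X, y, rcond=None)

# Predict at new points by expanding them with the same basis.
x_new = np.linspace(-2, 2, 5)
y_hat = np.vander(x_new, N=degree + 1, increasing=True) @ beta
print(beta)   # estimated coefficients beta_0 ... beta_d
print(y_hat)
```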
Key Characteristics
- Linear in parameters, non-linear in inputs – Fitting still minimizes the sum of squared errors over a linear combination of basis functions; the “features” are \(1, x, x^2, \ldots, x^d\) (plus cross-terms if multivariate). Coefficients are interpretable as weights on each basis term, though individual \(\beta_j\) are less directly interpretable than in pure linear regression.
- Degree and overfitting – A higher degree allows more curvature and a closer fit to the training data, but increases overfitting risk and sensitivity to outliers. Degree acts as a complexity knob; cross-validation or validation-set performance is used to choose the degree or to regularize the polynomial coefficients (see the cross-validation sketch after this list).
- Numerical stability – Powers of \(x\) can have very different scales (e.g. \(x^{10}\) vs \(x\)); centering and scaling features, or using orthogonal polynomials, improves the conditioning of the design matrix and the stability of the solution (see the conditioning check after this list).
- Extrapolation risk – Polynomials tend to diverge sharply outside the range of the training data, so predictions beyond the observed \(x\) range are often unreliable. Use is best restricted to interpolation or accompanied by clear uncertainty statements (see the extrapolation demo after this list).
- Basis expansion view – Polynomial regression is a special case of basis expansion; the same idea extends to splines, Fourier terms, or other bases that capture non-linearity while keeping the model linear in parameters.
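To make the degree-selection point concrete, here is a sketch using scikit-learn (assuming it is available; the data are synthetic): a PolynomialFeatures + LinearRegression pipeline is scored by 5-fold cross-validation across candidate degrees, and the best-scoring degree is kept.

```python
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(1)
x = rng.uniform(-3, 3, size=(200, 1))
y = np.sin(x).ravel() + rng.normal(scale=0.2, size=200)

# Score each candidate degree by cross-validated R^2 and keep the best.
scores = {}
for degree in range(1, 10):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    scores[degree] = cross_val_score(model, x, y, cv=5, scoring="r2").mean()

best = max(scores, key=scores.get)
print(best, scores[best])
```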
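The stability point is easy to check numerically: the condition number of a raw Vandermonde matrix grows rapidly with degree, while centering and scaling \(x\) (or switching to an orthogonal basis such as Legendre polynomials) brings it down. A small NumPy sketch with made-up data:

```python
import numpy as np

x = np.linspace(0, 1000, 200)        # raw feature on a wide scale
degree = 6

# Raw powers 1, x, ..., x^6: columns differ by many orders of magnitude.
X_raw = np.vander(x, N=degree + 1, increasing=True)

# Center and scale x onto [-1, 1] before taking powers.
z = 2 * (x - x.min()) / (x.max() - x.min()) - 1
X_scaled = np.vander(z, N=degree + 1, increasing=True)

# Orthogonal (Legendre) basis on [-1, 1] is better-conditioned still.
X_leg = np.polynomial.legendre.legvander(z, degree)

for name, M in [("raw", X_raw), ("scaled", X_scaled), ("legendre", X_leg)]:
    print(name, np.linalg.cond(M))
```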
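The extrapolation warning can also be demonstrated directly: a high-degree fit that tracks the training data inside its range will typically swing wildly just outside it. A toy illustration (synthetic data, degree chosen for effect):

```python
import numpy as np

rng = np.random.default_rng(2)
x = rng.uniform(0.0, 1.0, size=40)
y = np.sin(2 * np.pi * x) + rng.normal(scale=0.1, size=40)

# Fit a degree-9 polynomial inside [0, 1].
coef = np.polynomial.polynomial.polyfit(x, y, deg=9)

# Inside the training range the fit is tame; beyond it the value
# typically drifts far from the underlying sine curve.
for x0 in (0.5, 1.0, 1.2, 1.5):
    print(x0, np.polynomial.polynomial.polyval(x0, coef))
```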
Common Applications
- Trend and growth modeling – Capturing non-linear trends over time or over a single predictor (e.g. quadratic or cubic growth, saturation)
- Calibration and response curves – Modeling instrument response or dose–response as a smooth curve when theory suggests a simple polynomial form
- Economics and demand – Representing non-linear price–quantity or income–demand relationships with quadratic or higher-order terms
- Engineering and physics – Approximating relationships that are known to be smooth and possibly polynomial in a transformed variable
- Feature engineering for linear models – Adding \(x^2\), \(x_i x_j\), or other polynomial terms to a linear model to capture curvature and interactions without leaving the linear regression framework (see the sketch after this list)
- Baseline for non-linearity – Testing whether a simple polynomial improves over linear regression before turning to splines, trees, or neural networks
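As one illustration of that feature-engineering pattern, the sketch below uses scikit-learn's PolynomialFeatures (again assuming scikit-learn; the data are made up) to generate squared and interaction terms for a two-feature problem and fits an ordinary linear regression on the expanded matrix:

```python
import numpy as np
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression

rng = np.random.default_rng(3)
X = rng.normal(size=(100, 2))        # two raw features x0, x1
y = (1 + X[:, 0] - 2 * X[:, 1] + 3 * X[:, 0] * X[:, 1]
     + rng.normal(scale=0.1, size=100))

# Degree-2 expansion adds x0^2, x0*x1, x1^2 (bias column dropped here
# because LinearRegression fits its own intercept).
poly = PolynomialFeatures(degree=2, include_bias=False)
X_poly = poly.fit_transform(X)
print(poly.get_feature_names_out())  # ['x0' 'x1' 'x0^2' 'x0 x1' 'x1^2']

model = LinearRegression().fit(X_poly, y)
print(model.coef_)                   # interaction weight should land near 3
```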